Plastome evolution and phylogeny of the tribe Ruteae (Rutaceae)

Abstract Rutaceae is a large family, and the genus‐level classification in the subfamilies or tribes of this family is not unified based on different taxonomic treatments. Until now, phylogenetic relationships of some genera in traditional tribe Ruteae have not been clearly resolved. In this study, seven new complete plastomes of this tribe were sequenced, and a comparative analysis was performed to investigate their plastome characteristics and evolution. In addition, we inferred the phylogenetic relationships of Ruteae based on complete plastome and nuclear ITS data. All plastomes exhibited a typical quadripartite structure and were relatively conserved in their structure and gene arrangement. Their genome sizes ranged from 154,656 bp to 160,677 bp, and the size variation was found to be associated with differences in IR expansion and gene loss. A total of 112 to 114 genes were identified in the genomes, including 78 to 79 protein‐coding genes, 30 tRNA genes, 4 rRNA genes, and 2 pseudogenes. Sequence divergence analysis indicated that non‐coding regions exhibited a higher percentage of variable characters, and nine non‐coding and six coding regions were identified as divergent hotspots. Phylogenetic results based on different datasets showed that this tribe was divided into three reciprocally exclusive groups. The phylogenetic analyses between plastome and nuclear ITS data were partly incongruent with each other. This study provides new insights into plastome evolution of Ruteae as well as Rutaceae. The availability of these plastomes provides useful genomic resources for molecular DNA barcodes and phylogenetically informative markers and deepens our understanding of the phylogeny in Ruteae.


| INTRODUC TI ON
Rutaceae is a large family (c. 154 genera and 2100 species) with great diversity in morphological characters and a nearly worldwide distribution (Appelhans et al., 2021;Groppo et al., 2022;Kubitzki et al., 2011;Zhang et al., 2008). This family comprises tall or short trees, shrubs or climbers, and rarely herbs. The relationships among the genera within the family are very complicated, and there have been considerably different taxonomic treatments (Appelhans et al., 2021;Chase et al., 1999;Groppo et al., 2008Groppo et al., , 2012Hartley, 1981;Kubitzki et al., 2011;Morton & Telmer, 2014;Poon et al., 2007;Scott et al., 2000;Waterman, 1975). The first comprehensive classification of the Rutaceae was undertaken by Engler (1896Engler ( , 1931, in which 7 subfamilies, 12 tribes, and several subtribes were proposed. Among them, tribe Ruteae in the subfam- and Dictamnus L. Among them, Ruta includes seven to nine species of strongly scented herbs or half-shrubs with a distribution from the Mediterranean region to southwest Asia (Kubitzki et al., 2011;Salvo et al., 2008). Haplophyllum was first proposed by Jussieu (1825) but placed in Ruta as a subgenus by Engler (1931). In later systematic works, the genus Haplophyllum was reinstated based on morphological and phytochemical features (Navarro et al., 2004;Townsend, 1986). It includes 66-68 species of herbs and half-shrubs distributed in the western Mediterranean, northern Africa, Arabia, Central Asia, and China (Salvo et al., 2011). Boenninghausenia consists of one to three species of strongly scented herbs with a distribution from the Himalayas eastwards to Japan and southwards to Java and Lesser Sunda Islands (Kubitzki et al., 2011). Both Psilopeganum and Cneoridium are monotypic genera; the former is endemic to the narrow area of Central and Southwest China (Zhang et al., 2008), and the latter is restricted to southwestern North America. Thamnosma consists of eight shrubs or subshrubs species that occur in the southwestern United States, Somalia, and southern Africa. Dictamnus probably includes a single polymorphic species of strongly scented herbs widely distributed from warm-temperate Europe through temperate Asia to North China (Kubitzki et al., 2011).
The traditional Ruteae was defined mainly by morphological characters, life forms, and habitat. However, generic taxonomy and relationship in this tribe are controversial (Gottlieb & Ehrendorfer, 1988;Kubitzki et al., 2011;Moore, 1936;Townsend, 1986). In the past decade or so, molecular sequence data have provided important insights into the intergeneric relationships of tribe Ruteae and related taxa. An early phylogenetic study using three plastid markers (Salvo et al., 2008) found that Haplophyllum and Ruta formed reciprocally exclusive monophyletic groups and that Dictamnus was not closely related to the other genera of Ruteae. A later study based on the same plastid markers and two nuclear markers was the first to include the genus Psilopeganum in the phylogenetic analysis of Ruteae (Appelhans et al., 2016). The results showed that Psilopeganum belonged to a group with Boenninghausenia, Ruta, and Thamnosma, but its exact position remained unclear. This study also showed that Haplophyllum and Cneoridium had a closer relationship with the subfamily Aurantioideae than with other genera of Ruteae. In the recent study on subfamily classification of Rutaceae, the genera of traditional Ruteae were placed into four subfamilies (Appelhans et al., 2021). Boenninghausenia, Psilopeganum, Ruta, Thamnosma, and another genus Chloroxylon DC. composed the subfamily Rutoideae.
However, Haplophyllum, Cneoridium, and Dictamnus belonged to the subfamilies Haplophylloideae, Amyridoideae, and Zanthoxyloideae, respectively. Although different taxa of Ruteae and related genera were sampled in these phylogenetic studies, only a limited number of genetic markers were used to infer the phylogeny, and the resolution of some core nodes in the phylogenetic trees was low or lacking.
Therefore, additional genetic characters are essential for resolving the phylogenetic relationships among the genera of Ruteae and related taxa.
Chloroplast genomes (plastomes) are usually circular DNA molecules composed of a large single copy (LSC) region and a small single copy (SSC) region separated by a pair of inverted repeat (IR) regions. Plastome is independent of nuclear genome and has highly conserved gene content, number, and structure (Barrett et al., 2014;Daniell et al., 2016). Over the past years, it has been used to elucidate plant molecular evolution, enhancing our understanding of chloroplast biology, intracellular gene transfer, conservation, and diversity (Daniell et al., 2016). Moreover, plastome sequences have been widely used to infer phylogeny relationships at different taxonomic levels and are sometimes used as DNA barcodes for species identification (Dong et al., 2012(Dong et al., , 2021(Dong et al., , 2022Jansen et al., 2007).
In recent years, the large number of complete plastome sequences has provided helpful data for modern taxonomy and phylogenetics and provided a higher resolution of evolutionary relationships within phylogenetic clades compared with small numbers of plastid or nuclear markers (Jansen & Ruhlman, 2012;Ma et al., 2014;Ruhfel et al., 2014;Daniell et al., 2016;Foster et al., 2018;Song et al., 2020).
To date, plastome characteristics and evolution in Ruteae at the generic level are not clear, and it is not available using the complete plastome data to infer the phylogeny of this tribe. In the present study, we first sequenced the complete plastomes of seven species in six genera (Psilopeganum, Boenninghausenia, Thamnosma, Haplophyllum, Cneoridium, and Dictamnus) and added the plastome of Ruta graveolens L. obtained from GenBank for comparative analyses. We attempted to reveal the plastome characteristics and evolution of the seven genera and provide useful plastid genomic resources for molecular DNA barcodes and phylogenetically informative markers. We then performed a phylogenomic analysis based on complete plastomes from 8 species of Ruteae, 15 other Rutaceae species, and 3 outgroup species from Simaroubaceae. In addition, we extended sampling and added the published plastid markers and nuclear internal transcribed spacer (ITS) sequences to construct the phylogenetic relationships of this tribe and related taxa to improve our understanding of the Ruteae phylogeny based on different data.

| Genomes assembly and annotations
Complete plastome and ITS sequences of the seven species were assembled using GetOrganelle v1.6.2d with default parameters . The GetOrganelle first filtered plastid-like reads, conducted the de novo assembly, purified the assembly, and finally, generated the complete plastid genomes (Bankevich et al., 2012;Camacho et al., 2009;Langmead & Salzberg, 2012). K-mer gradients for a mean and maximum 150 bp reads were set as "-k 21, 45, 65, 85,105" for all species. Bandage (Wick et al., 2015) was used to visualize the final assembly graphs to authenticate the automatically generated plastid genome. Gene annotation was conducted using Plann (Huang & Cronk, 2015) with the annotated plastome of Ruta graveolens (GenBank accession NC_045946) as the reference genome.

| Characterization of repeat sequences
Repeat sequences were analyzed to calculate the content of dispersed and tandem repeats. Dispersed repeats were detected following Lee et al. (2020). Tandem Repeats Finder version 4.09 (Benson, 1999) was used to identify tandem repeats in the plastomes with default alignment parameters of match, mismatch, and insertions and deletions (indels) of 2, 7, and 7, respectively. The proportion of total repeats (length of repeats/length of plastome [IRa excluded]) was calculated for each plastome. Simple sequence repeats (SSRs) were detected using the PERL script MicroSAtellite (MISA) (Thiel et al., 2003) with a size motif of one to six nucleotides and a threshold of 10, 5, 4, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide, respectively.

| Genome comparison and nucleotide divergence
The mVISTA program (http://genome.lbl.gov/vista/ mvist a/about. shtml) in the Shuffle-LAGAN mode (Frazer et al., 2004) was used with the R. graveolens annotation as a reference to compare the plastomes. To screen for polymorphic hotspots, 77 shared protein-coding genes (coding regions) and 107 intergenic spacer regions and introns (non-coding regions) of the plastomes were extracted separately.
All the regions were aligned using MAFFT (Katoh et al., 2019) and manually adjusted using MEGA 7 (Kumar et al., 2016). The aligned sequence length, number of variable sites, and number of parsimony informative characters (PICs) for each region were calculated using DnaSP 5.0 (Librado & Rozas, 2009). The variable sites include singleton and parsimony information sites. We used two methods to identify the mutation hotspots. The first was based on the percentage of variable sites, which calculated all the single-nucleotide substitutions and included the most divergence markers. The second was based on the percentage of PICs, which included the most informative markers (Ma et al., 2018;Meiklejohn et al., 2016).

| Phylogenetic analysis
To infer the phylogeny of Ruteae, three datasets were created. The first dataset CCG included 23 complete plastomes (7 newly generated and 16 obtained from GenBank) from 20 genera of Rutaceae, plus three outgroup sequences from Simaroubaceae (Appendix A and B). The second dataset CCG + PM combined CCG and the published sequences of plastid markers (matK, rpl16, and trnL-trnF) from 27 species of traditional Ruteae (Appendix C), representing as many taxa and individuals of this tribe as possible. In addition, we also included the available matK and trnL-trnF sequences of Chloroxylon in this dataset to test its phylogenetic relationship with regard to Ruteae (Appendix C). The third ITS dataset included 24 nuclear ITS sequences (7 newly created with the complete plastomes and 17 obtained from GenBank which are the same species but not the same samples as the plastomes) (Appendix B). All the sequences were aligned with MAFFT (Katoh et al., 2019), and ambiguous alignment regions were trimmed by Geneious 2021 with default parameters (Sites containing: Gaps = 20%) (Ripma et al., 2014).
The best-fit model of nucleotide substitution was selected using the ModelFinder program (Kalyaanamoorthy et al., 2017) under the Bayesian information criterion. Maximum-likelihood (ML) trees were inferred with the best GTR + F + R4 model from ModelFinder using the IQtree v1.6.12 (Nguyen et al., 2015), and 1000 bootstrap replicates were tested to evaluate the branch support values. The support values were marked in the strict tree (the consensus tree).
Bayesian inference (BI) was executed with MrBayes v3.2.3 (Ronquist et al., 2012) in the GTR + G model, set to run for 2 million generations with four chains, sampled every 100 generations, with all other settings left at their defaults and 25% of the trees discarded as burn-in.

| Features of plastomes
However, the eight species exhibited obvious variation in plastome size: the largest was C. dumosum (160,677 bp) and the smallest was T.

| Repeats and SSR analysis
Repeated sequences have played key roles in plastome rearrangement, divergence, and evolution, and SSRs have been used extensively in population genetics and molecular identification (Guisinger et al., 2010). Repeat analyses revealed that the repetitive DNA has minor variation among the eight plastomes ( Table 2). The highest proportion of dispersed repeats were identified in H. dauricum (1360 bp, 1.1%), followed by C. dumosum (1216 bp, 0.9%). They were slightly higher than the other six species ( Table 2). The greatest amount and proportion of tandem repeats were identified also in H. dauricum (3657 bp, 2.8%), followed by C. dumosum (1867 bp, 1.4%).

| IR contraction and expansion
The IR regions of the eight plastomes ranged in length from 26,320 bp

| Plastome sequence divergence
To characterize genome divergence, multiple sequence alignments were performed between the eight plastomes using mVISTA. The comparison demonstrated that these plastomes had high consistency in gene arrangement (Figure 4). The eight plastomes dataset had an aligned length of 169,666 bp, with 16,354 variable sites (9.64%) and 6263 parsimony informative characters (PICs, 3.69%) ( Table 4). The distribution pattern of variable sites varied greatly between coding and non-coding regions and among LSC, SSC, and IR regions. The average variable percentages of coding regions and non-coding regions were 8.18% and 10.82%, respectively. The SSC region exhibited the highest variation (17.80%), followed by the LSC region (11.93%), and the IR region (only 2.50%) ( Table 4). The PICs percentages of coding regions and non-coding regions were 3.12% and 4.15%, respectively. The SSC region had the highest PICs percentage (6.93%), followed by the LSC region (4.59%), and the IR region (only 0.88%) ( Table 4).

| Phylogenetic analysis
BI and ML analyses based on CCG dataset and CCG + PM dataset indicated similar tree topology (Figures 6 and 7). The results showed that the traditional tribe Ruteae was polyphyletic and was split TA B L E 2 Statistics of dispersed and tandem repeats in the eight plastomes.
The phylogenetic tree based on nuclear ITS dataset (Figure 8) showed that the four genera of Ruta, Psilopeganum, Boenninghausenia, and Thamnosma formed a monophyletic group with high support, and their relationships were consistent with the plastome tree.
The significant incongruence between the two datasets involved the relationships of Haplophyllum and Cneoridium. In the ITS tree, Haplophyllum was sister to the clade of Cneoridium-Aurantioideae and Zanthoxyloideae-, which was very different from the plastome tree.
In addition, in the ITS tree, Dictamnus was sister to Orixa-Casimiroa, but in the plastome tree, Casimiroa was sister to Dictamnus-Orixa.
However, the support values in these topological incongruence nodes were very low.

F I G U R E 3
Comparison of the borders of LSC, SSC, and IR regions among the eight plastomes. Different color-specific genes.

| Plastomes characteristics
We sequenced seven plastomes and compared eight plastomes representing all genera of the traditional tribe Ruteae in order to elucidate their plastome evolution. Their genome structures and gene orders were fairly conserved. All the genomes had a typical quadripartite structure composed of one LSC, one SSC, and two IR regions, and there were no rearrangements in genome organization (Figures 1 and 4). However, there was variation in gene content and gene number. GC content is an important genomic and systematic character that describes the nucleotide proportions in the genome.
When GC content in the genome is greater, it has a greater density of DNA bases, making the sequence more stable and difficult to mutate . The overall GC content of the eight plastomes varied from 38.3% to 39.1% (Table 1) and was clearly lower than the AT content. The IR regions showed higher GC contents due to the abundance of rRNA and tRNA (Li et al., 2016).
Gene loss often occurs during the evolution of angiosperms plastomes (Feng et al., 2020;Lei et al., 2016). Among the eight plastomes examined here, the ycf15 gene was lost in H. dauricum and D.  (Gao, 2017). The infA gene encodes the translation initiation factor 1 and is regarded as one of the most mobile genes, as it has been transferred many times between the cp and nuclear genomes in plants (Millen et al., 2001).
Using the clean data, we mapped the reads to the sequence of infA gene. We found a homologous copy in P. silopeganum and the sequence had high divergence with the plastome copy, indicating that this gene has transferred to the nuclear or mitochondrial genome.
However, this copy was not found in other species, indicating that infA may be completely lost.
The contraction and expansion of IRs generally lead to size variation in the plastomes of angiosperms and play important roles in genome evolution (Tanvi et al., 2017). Reduction of plastome size is caused mainly by IR contraction, gene loss, intron loss, or intergenic spacer loss (Dugas et al., 2015;Jansen et al., 2008). By contrast, increases in plastome size are usually caused by IR expansion and gene duplication (Dugas et al., 2015;Wang et al., 2018).

F I G U R E 5
Percentage of variable sites in aligned sequences of the eight plastomes. A: coding region; b: non-coding region.
change position from the LSC to the IRb region, and gene replication occurred in the IRa region ( Figure 3). IR expansion also led to the partial duplication of rpl22 and ycf1 in the boundary region, thereby producing two pseudogenes ( Figure 3).

| Molecular markers
Complete plastome sequences can provide vast sequence information for identifying different taxa and inferring their phylogenetic relationships (Daniell et al., 2016;Dong et al., 2012). Here, most sequence variation occurred in the non-coding sequences of the eight plastomes ( Figure 5; Table 4), indicating that the coding regions were more conserved than the non-coding ones, consistent with most angiosperm plastomes (Daniell et al., 2016;Feng et al., 2020). In particular, the IR regions were much less divergent than the LSC and SSC regions (Table 4). We identified nine non-coding and six coding regions with the highest percentage of variation as divergence hotspots (Table 5). These divergent regions may be undergoing rapid nucleotide substitution, indicating great potential molecular markers for phylogeny and evolutionary research in Ruteae as well as Rutaceae, especially at the generic level.
Previous phylogenetic studies of Ruteae and related genera have used coding genes (atpB, rbcL, rpl16, and matK) and non-coding markers (atpB-rbcL, trnL-trnF, rpl16 intron, and rps16 intron) (Appelhans et al., 2016(Appelhans et al., , 2021Chase et al., 1999;Groppo et al., 2008;Morton & Telmer, 2014;Salvo et al., 2008). We compared the sequence divergence between these markers and the current mutation hotspot regions. The percentages of variable sites and PICs in these markers are listed in Table 5. The results show that the four non-coding markers and two coding markers (atpB and rbcL) are much less variable than the mutation hotspot regions identified in this study, and other two coding markers, matK and rpl16, have higher percentages of variable sites and PICs.

| Phylogenetic analysis
We conducted phylogenomic analysis of the traditional tribe Ruteae and related taxa based on CCG dataset and CCG + PM dataset. For Clade 1, our results from both datasets showed that the core Ruteae consisted of Ruta, Psilopeganum, Boenninghausenia, and Thamnosma.
Within the core Ruteae, the CCG tree suggested that Ruta was the first divergent group and sister to the other three genera with high support. The results also indicated that Psilopeganum was sister to the TA B L E 5 Statistics for the most divergent hotspot regions identified in this study and the markers used in previous phylogenetic analyses (marked with asterisk). one is (Psilopeganum, (Ruta, (Boenninghausenia, Thamnosma))), and the other is (Ruta, (Psilopeganum, (Boenninghausenia, Thamnosma))).

Gene regions Type
Based on the floral development research in R. graveolens and P.
sinense, Wei et al. (2012) suggested that bicarpellate Psilopeganum most likely evolved from a Ruta-like tetracarpellate ancestor. Based on our plastome analysis and their anatomical features, the relationships among the four genera of the core Ruteae are more likely to be (Ruta, (Psilopeganum, (Boenninghausenia, Thamnosma))). Moreover, the phylogenetic tree has a short internode connected by long branches at the Ruta and Psilopeganum nodes, suggesting that this clade may have undergone rapid radiation.
Chloroxylon is a small genus of three tree species that has traditionally been placed in the Engler's subfamily Flindersioideae (Kubitzki et al., 2011). However, previous phylogenetic studies at the family level resolved this genus as a close relative of Ruta (Chase et al., 1999;Groppo et al., 2008;Morton & Telmer, 2014).
In the recent phylogenetic study, Chloroxylon, together with Boenninghausenia, Psilopeganum, Ruta, and Thamnosma, formed the Rutoideae subfamily and Chloroxylon was at the base of this subfamily (Appelhans et al., 2021). In our CCG + PM analysis, Chloroxylon belonged to the Ruteae tribe and was at the base of this tribe, consistent with the results of Appelhans et al. (2021). Considering that Chloroxylon is distributed in Madagascar, India, and Sri Lanka, this relationship suggested that the shrubby/herbaceous Ruteae may have a woody origin and an austral link through Chloroxylon.

F I G U R E 6
Phylogenetic reconstruction of Rutaceae and outgroup species using Bayesian inference (BI) and maximum-likelihood (ML) methods based on CCG dataset. The support values for the nodes are for posterior probabilities (PP) and bootstrap support (BS). Tribes (bars) are based on Engler (1931) and subfamilies (bars) are based on Appelhans et al. (2021). The newly sampled species' name was marked in bold.

Several studies based on different plastid markers found that
Dictamnus had a closer relationship with the genera Skimmia Thunb. and/or Casimiroa (Appelhans et al., 2021;Chase et al., 1999;Groppo et al., 2008;Morton & Telmer, 2014). Due to the incongruence between morphological characters and molecular analyses of these three genera, Kubitzki et al. (2011) (Kubitzki et al., 2011;Zhang et al., 2008). The fruit and seed characters provide morphological support for the closer relationship between Dictamnus and Orixa inferred from our data, although whether these are synapomorphies will require further investigation.
Phylogenetic results between plastome and ITS datasets were partly incongruent with each other. The ITS tree confirmed the monophyletic group of the core Ruteae and supported the relationships of (Ruta, (Psilopeganum, (Boenninghausenia, Thamnosma))).
However, there were conflicts concerning the phylogenetic positions of Haplophyllum and Dictamnus. Extensive incongruent patterns have been found between cp and nuclear data, most possibly caused by hybridization/introgression, incomplete lineage sorting (ILS), rapid radiation, and reticulate evolution (Degnan & Rosenberg, 2009;Dong et al., 2022;Lin et al., 2019;Pelser et al., 2010). In addition, sampling biases, inadequate nucleotide sequence data, and horizontal gene transfer (HGT) also have important impacts on incongruence (Meng et al., 2021;Soltis & Kuzoff, 1995). In this study, it is not possible to determine the underlying causes of the observed incongruences at the generic level and further study is needed. It is worth noting that most of the support values in the ITS tree were low, which was probably due to the inadequate variations provided by a limited number of F I G U R E 8 Phylogenetic reconstruction of Ruteae using Bayesian inference (BI) and maximum-likelihood (ML) methods based on ITS dataset. The support values for the nodes are for posterior probabilities (PP) and bootstrap support (BS). Tribes and subfamilies (bars) are same as in Figure 6. The newly sampled species' name was marked in bold.
DNA loci. Therefore, our results highlighted the advantage of using complete plastomes with more informative sites in resolving phylogenetic relationships of tribe Ruteae.

| CON CLUS IONS
This study investigated the characteristics and evolution of eight plastomes, representing all the genera of the traditional tribe Ruteae (Rutaceae). The results revealed that the genome structure and gene order were highly conserved among these genera. However, the genome size exhibited a greater difference, which was related to both IR expansion and gene loss. The sequence divergence and mutation hotspots of the plastomes were analyzed. Nine non-coding and six coding regions had the highest divergence and may be considered useful molecular barcodes and phylogenetic markers. The phylogenetic analysis based on different datasets confirmed the core Ruteae and supported separating traditional Ruteae into three reciprocally exclusive groups. This study revealed new insights into the plastome evolution in Ruteae as well as in Rutaceae, and indicated that the cp genomic data with obviously more informative sites could provide a deeper understanding of the phylogeny within Ruteae and related taxa.

ACK N OWLED G M ENTS
We thank the DNA Bank of China, Institute of Botany, Chinese Academy of Sciences, for providing some materials. We also would like to thank Ao Wang and Yachao Wang for their kind help in sample collection and technical support.

This research was supported by the Science and Technology Basic
Resources Investigation Program of China "Survey and Germplasm

Conservation of Plant Species with Extremely Small Populations in
South-west China" (Grant No. 2017FY100100).

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interest.